- 
Differentially private synthetic data provide a powerful mechanism to enable data analysis while protecting sensitive information about individuals. However, when the data lie in a high-dimensional space, the accuracy of the synthetic data suffers from the curse of dimensionality. In this paper, we propose a differentially private algorithm to generate low-dimensional synthetic data efficiently from a high-dimensional dataset, with a utility guarantee with respect to the Wasserstein distance. A key step of our algorithm is a private principal component analysis (PCA) procedure with a near-optimal accuracy bound that circumvents the curse of dimensionality. Unlike the standard perturbation analysis, our analysis of private PCA works without assuming a spectral gap for the covariance matrix.
Free, publicly accessible full text available January 15, 2026.
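For orientation, here is a minimal sketch of one standard route to private PCA, the Gaussian-mechanism ("analyze Gauss") approach of perturbing the empirical covariance before the eigendecomposition. It is not the paper's algorithm, and the normalization, sensitivity bound, and names below are our own assumptions.

    # Minimal sketch: Gaussian-mechanism private PCA (analyze-Gauss style).
    # NOT the paper's procedure; normalization and names are illustrative.
    import numpy as np

    def private_pca(X, k, eps, delta, rng=None):
        """Top-k principal directions of the rows of X under (eps, delta)-DP.

        Assumes every row of X has L2 norm at most 1, so replacing one of
        the n rows moves the empirical covariance X^T X / n by at most 2/n
        in Frobenius norm.
        """
        rng = np.random.default_rng() if rng is None else rng
        n, d = X.shape
        cov = X.T @ X / n
        # Gaussian mechanism calibrated to Frobenius sensitivity 2/n.
        sigma = (2.0 / n) * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        G = rng.normal(0.0, sigma, size=(d, d))
        noise = np.triu(G) + np.triu(G, 1).T  # symmetric Gaussian perturbation
        _, eigvecs = np.linalg.eigh(cov + noise)
        return eigvecs[:, -k:]  # eigenvectors for the k largest eigenvalues

The abstract's point is that the paper's utility analysis for its private PCA step avoids the spectral-gap assumption that standard perturbation arguments (e.g., Davis-Kahan) would impose on the covariance matrix.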
- 
We develop a unified approach to bounding the largest and smallest singular values of an inhomogeneous random rectangular matrix, based on the non-backtracking operator and the Ihara-Bass formula for general random Hermitian matrices with a bipartite block structure. We obtain probabilistic upper (respectively, lower) bounds for the largest (respectively, smallest) singular values of a large rectangular random matrix X. These bounds are given in terms of the maximal and minimal 2-norms of the rows and columns of the variance profile of X. The proofs involve finding probabilistic upper bounds on the spectral radius of an associated non-backtracking matrix B. The two-sided bounds can be applied to the centered adjacency matrix of sparse inhomogeneous Erdős-Rényi bipartite graphs for a wide range of sparsity, down to criticality. In particular, for Erdős-Rényi bipartite graphs G(n, m, p) with p = ω(log n)/n and m/n → y ∈ (0, 1), our sharp bounds imply that there are no outliers outside the support of the Marčenko-Pastur law almost surely. This result extends the Bai-Yin theorem to sparse rectangular random matrices.
Free, publicly accessible full text available November 1, 2025.
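For concreteness, the "no outliers" claim pins the extreme singular values to the Marčenko-Pastur edges. Under the stated scaling, and assuming the standard normalization of the centered adjacency matrix (our choice for illustration), this reads:

    \[
      \frac{s_{\max}(A - \mathbb{E}A)}{\sqrt{np(1-p)}} \;\to\; 1 + \sqrt{y},
      \qquad
      \frac{s_{\min}(A - \mathbb{E}A)}{\sqrt{np(1-p)}} \;\to\; 1 - \sqrt{y}
      \quad \text{almost surely.}
    \]

These are precisely the Bai-Yin limits, here extended down to the sparse regime p = ω(log n)/n.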
- 
We compute the eigenvalue fluctuations of uniformly distributed random biregular bipartite graphs with fixed and growing degrees for a large class of analytic functions. As a key step in the proof, we obtain a total variation distance bound for the Poisson approximation of the number of cycles and cyclically non-backtracking walks in random biregular bipartite graphs, which might be of independent interest. We also prove a semicircle law for random [Formula: see text]-biregular bipartite graphs when [Formula: see text]. As an application, we translate the results to adjacency matrices of uniformly distributed random regular hypergraphs.
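For context on the Poisson step: the classical benchmark for random d-regular graphs is that the number of k-cycles is asymptotically Poisson with mean (d-1)^k/(2k). The natural biregular bipartite analogue (notation ours; all cycles in a bipartite graph have even length) takes the form

    \[
      C_{2k} \;\xrightarrow{d}\; \mathrm{Poisson}\!\left(
        \frac{\left[(d_1 - 1)(d_2 - 1)\right]^{k}}{2k}
      \right),
    \]

where C_{2k} counts cycles of length 2k in a uniformly random (d_1, d_2)-biregular bipartite graph; the paper's contribution includes a quantitative total variation bound for approximations of this kind.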
- 
Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Eds.)
We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis shows high-probability non-asymptotic concentration results for the training errors, cross-validations, and generalization errors of RFRR centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from an expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of both the RFRR and the KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model.
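A minimal sketch of the RFRR setup described above, using a ReLU activation and our own illustrative names (the paper covers a wide class of activations and works in the regime where the width far exceeds the sample size):

    # Illustrative sketch of random feature ridge regression (RFRR).
    # Activation, scaling, and parameter names are assumptions, not the paper's.
    import numpy as np

    def rfrr_fit_predict(X_train, y_train, X_test, width, ridge, rng):
        """Two-layer model: fixed random Gaussian first layer, ridge on top."""
        d = X_train.shape[1]
        W = rng.normal(size=(width, d)) / np.sqrt(d)  # random first-layer weights
        act = lambda Z: np.maximum(Z, 0.0)            # ReLU, one admissible choice
        F_tr = act(X_train @ W.T) / np.sqrt(width)    # random feature map
        F_te = act(X_test @ W.T) / np.sqrt(width)
        # Dual (kernel) form: with width >> n, solve the n x n system
        # instead of the width x width one.
        K = F_tr @ F_tr.T
        alpha = np.linalg.solve(K + ridge * np.eye(K.shape[0]), y_train)
        return F_te @ (F_tr.T @ alpha)

    # Nearly orthogonal unit-length inputs, as in the abstract, arise from
    # random unit vectors in high dimension (toy data, hypothetical target):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2000))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    X_new = rng.normal(size=(20, 2000))
    X_new /= np.linalg.norm(X_new, axis=1, keepdims=True)
    y = X[:, 0]
    preds = rfrr_fit_predict(X, y, X_new, width=4096, ridge=1e-2, rng=rng)

As the width grows, the empirical kernel K concentrates around the expected kernel of the KRR that the abstract uses as its comparison point.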